Optimizing the ensemble for equilibration in broad-histogram Monte Carlo simulations 
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We present an adaptive algorithm which optimizes the statistical-mechanical ensemble in a gener- 
alized broad-histogram Monte Carlo simulation to maximize the system's rate of round trips in total 
energy. The scaling of the mean round-trip time from the ground state to the maximum entropy 
state for this local-update method is found to be 0(\N log A'] 2 ) for both the ferromagnetic and the 
fully frustrated 2D Ising model with N spins. Our new algorithm thereby substantially outperforms 
flat-histogram methods such as the Wang-Landau algorithm. 
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I. INTRODUCTION 

At first-order phase transitions and in systems with 
many local minima of the free energy such as frustrated 
magnets or spin glasses, conventional Monte Carlo meth- 
ods simulating canonical ensembles have very long equi- 
libration times. Several simulation methods have been 
developed to speed up such systems, including the multi- 
canonical method 0, broad histograms Q,_parallel tem- 
pering 0, the Wang-Landau algorithm and varia- 
tions thereof p| . Most of these methods simulate a flat- 
histogram ensemble: Instead of sampling a configuration 
of energy E with Boltzmann weight w(E) oc exp(—/3E), 
they use weights w(E) oc l/g(E), where g(E) is the den- 
sity of states. The probability distribution of the energy, 
n{E) = w(E)g(E), then becomes constant, producing a 
flat energy histogram. Naively, one might assume that 
sampling all energies equally often produces an unbiased 
random walk in energy. However, it was recently shown 
that the growth with the number of spins A of the 
"tunneling times" between low and high energy in any 
local-update flat-histogram method is stronger than the 
naive TV 2 of an unbiased random walk in energy for var- 
ious 2D Ising models: as ~ N 2A for the ferromagnetic 
and ~ A 2 9 for the fully-frustrated models. For the 2D 
±J spin glass, exponential growth was observed [(|. 

In view of these results for flat-histogram simulations 
we have asked how this type of simulation can be 
improved, both in terms of computer time and statistical 
errors. The general type of application we have in mind is 
to the equilibrium behavior of a system that is very slow 
to equilibrate in a conventional simulation, such as do- 
main walls in ordered phases, low-energy configurations 
of frustrated systems or a spin-glass ordered phase. Our 
new algorithm instead simulates a broad-histogram en- 
semble, where the system can, at "equilibrium" in this 
ensemble, wander to part of its phase diagram where 
equilibration is rapid. We look specifically at histograms 
that are broad in energy, but in general another variable 
other than the energy could be used. To minimize the 
statistical errors of measurements in the energy range 
of interest, one maximizes the number of statistically- 



independent visits. For a glassy phase, the system will 
relax very little as long as it remains in that phase, so to 
get a new statistically-independent visit to the phase the 
system has to leave it and equilibrate elsewhere (usu- 
ally at high energies). Thus the quantity we want our 
simulation to maximize is the number of round trips - 
between low and high energy - per unit computer time. 
This should minimize both the simulation's equilibration 
time and the statistical errors in the low-energy regime 
of interest. 

In this paper, we present an algorithm that systemat- 
ically optimizes the ensemble simulated to maximize the 
rate of round trips in energy. We use a feedback loop that 
reweights the ensemble based on preceding measurements 
of the local diffusivity of the total energy. This detects 
the "bottlenecks" in the simulation as minima in the dif- 
fusivity (at critical points in the cases we study), and 
reallocates resources to those energies in order to mini- 
mize the slowdown. We find that the resulting statistical 
errors in the density of states as estimated by this new 
algorithm are nearly uniform in energy, in strong con- 
trast to flat-histogram simulations where the errors are 
much larger at low energy than at high energy. While 
our algorithm is rather general and should be widely ap- 
plicable to study complex systems, we have developed 
and tested it on ferromagnetic (FMI) and fully-frustrated 
(FFI) square-lattice Ising models. 



II. FEEDBACK OF LOCAL DIFFUSIVITY 

In our simulations, the system's energy does a random 
walk in the energy range between two extremal energies, 
E— < E < E + where we take the lowest energy E- to 
be the system's ground state, although this is not neces- 
sary for our approach. Consider a general ensemble, with 
weights w(E), which define the acceptance probabilities 
for moves based on the Metropolis scheme 
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Our algorithm iteratively collects data from batch runs 
which simulate with a fixed ensemble. During a simu- 
lation detailed balance is strictly satisfied at all times. 
For a reasonably large number of sweeps we can thus 
measure the equilibrium distribution of the energy in 
this ensemble which is n w (E) oc w(E)g(E). The simu- 
lated system does a biased and Markovian random walk 
in configuration space. Since we bias this walk based 
only on total energy, the projection of this random walk 
onto that variable is what we will discuss. This pro- 
jection, which ignores all properties of the state other 
than its total energy, results in a random walker that is 
non-Mar kovian, with its memory stored in the system's 
configuration. Thus the simulation may be viewed as a 
biased non-Markovian random walker moving along the 
allowed energy range between the two extremal energies. 

To measure the round trips we add a label to the walker 
that says which of the two extremal energies it has vis- 
ited most recently. The two extrema act as "reflecting" 
and absorbing boundaries for the labeled walker: e.g., if 
the label is plus, a visit to E + does not change the la- 
bel, so this is a "reflecting" boundary. However, a visit 
to E- does change the label, so the plus walker is ab- 
sorbed at that boundary. The steady-state distributions 
of the labeled walkers satisfy n-(E) + n + (E) = n w (E). 
It is important to note that the behavior of the labeled 
walker is not affected by its label except when it visits 
one of the extrema and the label changes. When the un- 
labeled walker is at equilibrium, the labeled walker is in a 
nonequilibrium steady state. Let f(E) = n + (E)/n w (E) 
be the fraction of the walkers at E that have label plus, so 
they have most recently visited E+. The above-discussed 
boundary conditions dictate f(E_) = and f(E + ) = 1. 

To calculate the rate of round trips, we note that in 
steady state the current j of the labeled walkers is in- 
dependent of E. The plus and minus walkers drift in 
opposite directions and the equilibrium unlabeled walker 
has no net current. We first examine the case of a con- 
tinuous energy E. The steady-state current from E+ to 
E- to first order in the derivative is 



j = D(E)n w (E) 



dE 



(2) 



where D{E) is the walker's diffusivity at energy E. There 
is no current if / is constant, since this is equilibrium; 
this is why the current is to leading order proportional 
to df /dE. If one rearranges the above equation and in- 
tegrates on both sides, noting that j is a constant and / 
runs from to 1, one obtains 



J 



dE 



E _ D{E)n w {E) 



(3) 



In the following we separately discuss how we can max- 
imize the rate of round-trips for Metropolis and TV-fold 
way dynamics based on this estimate of the current. 



A. Metropolis dynamics 

For Metropolis dynamics the rate of round-trips is sim- 
ply proportional to the current. To maximize the round- 
trip rate, the above integral, Eq. @, must be minimized. 
However, there is a constraint: n w (E) is a probability 
distribution and must remain normalized. We do this by 
adding a Lagrange multiplier: 



E _ \D(E)n w (E) 



(4) 



To minimize this integrand, the ensemble, that is the 
weights w(E) and thus n w (E) are varied. At this point 
we assume that the dependence of D(E) on the weights 
can be neglected. This is justified by noting that the 
rates of transitions between configurations depend only 
on the ratios of weights, so the diffusivity D(E) is un- 
changed when the weights are multiplied by an energy- 
independent constant. By ignoring the variation of D(E) 
with the weights, we are assuming that the adjustments 
to the weights are slowly- varying in energy, which is true 
for most systems, particularly for large systems where the 
energy range being studied is large. With this assump- 
tion, the optimal weighting which minimizes the above 
integrand is 



,(op*) - 



1 



^D{E)X 



dE 



(5) 



Thus for the optimal ensemble with Metropolis dynam- 
ics, the probability distribution is simply inversely pro- 
portional to the square root of the local diffusivity. 



B. A- fold way dynamics 

Since Metropolis dynamics can be slowed down by high 
rejection rates of singular moves, e.g. in the vicinity of 
the fully polarized ground state of the FMI, or the occur- 
rence of multiple, generally accepted zero-energy moves 
it can be advantageous to introduce rejection-free single- 
spin flip updates such as the A- fold way 0. A-fold 
way dynamics involve two time scales, the walker's time 
and the computer time. At a given energy level the two 
time scales differ by the (energy-dependent) lifetime of a 
given spin configuration. The random walk with A-fold 
way dynamics is an equilibrium process when measured 
in walker's time, that is the equilibrium distribution, 
n w (E), is proportional to the amount of walker's time 
the walker spends at E. However, for the ensemble opti- 
mization with A-fold way dynamics we want to speedup 
the random walk measured in computer time. This setup 
with two clocks requires a slightly different reweighting 
procedure than is presented above for Metropolis dynam- 
ics. 

As for the Metropolis dynamics the amount of walker's 
time it takes to make a round trip is proportional to 
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\jj given in Eq. (j3J. However, we are interested in 
minimizing the amount of computer time spent, so we 
need to multiply this by the ratio of computer time to 
walker's time at E which we denote as t(E). Let us 
assume the distribution n w (E) is normalized to integrate 
to one. Then for one unit of walker's time, the fraction 
spent at E in dE is n w (E)dE . The amount of computer 
time used per unit walker's time is thus 



T = 



,{E)t{E)dE 



(6) 



E- 



To find the weights that minimize the round-trip time as 
measured in computer time, the full quantity we want to 
minimize is thus 



dE 



D(E)n w (E) 



t(E')n w (E')dE' 



(J) 



Since the probability distribution n w {E) occurs in both 
the numerator and denominator of the integrand there 
is no need to enforce its normalization by a Lagrange 
multiplier. To extremize the integrand, we will again vary 
the weights w(E), which gives the following condition for 
the optimum: 



T _ t(E) 
D(E)nUE) ~ ~T 



(8) 



so the weights should be chosen to give the optimal dis- 
tribution 



Aopt) 



(E) 



D(E)t(E) 



(9) 



For the optimal ensemble with ./V-fold way dynamics, the 
probability distribution is thus larger at the points with 
smaller t(E) (since they do not cost a lot of computer 
time) and smaller diffusivity D{E). 
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FIG. 1: Histograms of the optimal ensemble for the 2D fully 
frustrated Ising model with Metropolis dynamics. For various 
system sizes and a broad energy range the rescaled data points 
collapse onto a a power-law divergence \(E — Eo)/N hi N} -1 
(bold line). The inset shows the diffusivity D(E) for the same 
model which is proportional to [(E — Eq)/N] 2 (bold line). 



In practice we work with the logarithms of the weights, 
so the reweighting becomes 



lnw'(E) = lnw(E) 
1 

+ 2 



(13) 



In 



dE 



\nn w (E) - lnt(E) 



where all energy independent terms have been dropped 
as they introduce a constant shift only. For Metropolis 
updates the last term \nt(E) can also be dropped. Each 
subsequent batch should be run significantly longer than 
the previous one - in our implementation we double the 
number of sweeps - in order to get better statistics, and 
fed back to improve the estimates of the optimal weights. 



Feedback iteration 



To feed back we simulate with some trial weights w(E), 
get steady-state data for n w (E) and f(E) and thus obtain 
estimates for the diffusivity via 

D w = -ikw ■ ( 10 ) 

n ™( E )dE 

For Metropolis dynamics chose new weights w'(E) so that 



n w ,(E)=A^/n w (E)-^ , (11) 

where A is a normalization constant whose value is not 
needed to run the next "batch" of the simulation with 
the new weights w'(E). For ./V-fold way dynamics the 
new weights w'(E) are chosen to be 

n ^ = f^a%W)- (12) 



III. IMPLEMENTATION AND APPLICATIONS 

We implemented this algorithm for 2D Ising models 
with single-spin-flip Metropolis and TV-fold way dynam- 
ics, found the optimal ensembles for the FMI and FFI 
models, obtained the scaling of round-trip times and cal- 
culated the density of states and its statistical errors for 
both models. In both cases we used the ground state, 
E- = Eq, and zero energy E + = 0, as the energy limits. 

In the initial batch mode step we simulated a flat- 
histogram ensemble for small system sizes using either 
the exact density of states Q or a rough estimate thereof 
calculated with the Wang-Landau algorithm For 
larger systems (N > 64 x 64) where the round-trip times 
for the flat-histogram ensemble are more than a magni- 
tude larger than for the optimized ensemble, we produced 
an initial estimate of the optimal weights by extrapolat- 
ing the optimized weights of smaller systems. For all 
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FIG. 2: Scaling of round-trip times in the energy interval 
[—TV, 0] for the flat-histogram (open symbols) and optimized 
ensemble (filled symbols) of the 2D fully frustrated Ising 
model with Metropolis (circles) and TV-fold way (squares) dy- 
namics. The solid lines correspond to a logarithmic (power- 
law) fit for the optimized (flat-histogram) ensemble. The in- 
set illustrates the frustrated couplings of the fully frustrated 
model. 

batch mode steps the fraction of labeled walkers, f(E), 
was determined by recording two histograms, one for the 
equilibrium (unlabeled) walker and one for the labeled 
walker which most recently visited E-. The derivative 
df I dE was then estimated by a linear regression of sev- 
eral neighboring points at each energy level. The num- 
ber of points used for the regression can be reduced for 
subsequent batch mode steps as the estimate of f(E) be- 
comes increasingly accurate due to better statistics. In 
the final batch mode steps of our calculations the regres- 
sion was performed using a minimum of three points. In 
general, there is a trade-off between the accuracy in the 
measurement of the local diffusivity and the number of 
feedback steps. For the Ising models we study we found 
rapid convergence to the optimal ensemble. For small 
systems, TV < 32 x 32, an initial batch mode step with 
some 10 5 to 10 6 sweeps was sufficient to find the optimal 
weights after the first feedback step. Since the possi- 
ble energy levels are discrete for the Ising model, special 
care is taken when applying the reweighting derived for 
the continuum limit, particularly at the boundaries of the 
energy interval [J5_, E + ]. However, we did not encounter 
any subtlety for either model. 



A. Fully frustated Ising model 

We first present our results for the fully-frustated 
model (FFI), which has a critical point at its ground 
state, and shows rather simple scaling of our algorithm's 
behavior with energy and system size. For the optimized 
ensemble of the FFI the histogram of the equilibrium 
random walker is no longer fiat, but exhibits a power-law 




E/2N 

FIG. 3: Histograms for the 2D ferromagnetic Ising model ob- 
tained after feedback: for the optimal ensemble a peak evolves 
around the critical energy in the histograms. The additional 
peak near the fully polarized groundstate found for Metropo- 
lis updates (thin line) can be eliminated by changing the dy- 
namics to TV- fold way updates (bold line). The inset shows 
the fraction f(E) of walkers which have most recently visited 
E+ = for flat-histogram (multicanonical) sampling and the 
optimal ensembles for Metropolis and TV-fold way dynamics. 



divergence at its ground state, as shown in Fig. ^ This 
divergence reflects the behavior, D(E) ~ [[E — E )/N] 2 
of the diffusivity, as is seen in the inset of Fig. ^ These 
power-law behaviors extend from the first few points, 
E-E a = 0(1), up nearly to zero energy, E—E a = O(N). 
If we accept that the critical exponent for the diffu- 
sivity is indeed 2, then the optimal distribution scales 
as ti w ~ 1/[{E — Eq)\uN], and the round-trip time 
as t ~ (TV In TV) 2 , consistent with our results shown in 
Figs. [Hand H 

Noting that for our optimized ensemble the system 
spends a large fraction of its time near the ground state 
where many Metropolis moves are rejected, we applied 
a version of our algorithm that instead uses single-spin- 
flip rejection-free TV-fold way updates. We find the TV- 
fold way updates do give a significant speedup com- 
pared to Metropolis dynamics, but do not change the 
t ~ (TV In TV) 2 scaling of the round-trip time. In com- 
parison to the performance of flat-histogram sampling 
we find a substantial speedup up to a factor of around 
50 for the largest simulated system with TV = 128 x 128 
spins, see Fig.|2 



B. Ferromagnetic Ising model 

We now turn to the results for the ferromagnetic Ising 
model (FMI) which exhibits a finite-temperature second 
order phase transition. After applying the feedback, we 
find a peak in the histogram near the critical energy, 
as shown in Fig. [31 For Metropolis updates a second 
divergence close to the fully polarized ground state ap- 
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FIG. 4: Scaling of round-trip times in the energy interval 
[— 2N, 0] for the flat-histogram ensemble (open symbols) and 
the optimal ensemble (filled symbols) of the 2D ferromag- 
netic Ising model with Metropolis (circles) and iV-fold way 
dynamics (squares). The solid (dashed) lines correspond to a 
logarithmic (power-law) fit for the optimized (flat-histogram) 
ensemble. The inset shows the scaling of histograms for the 
optimal ensemble for iV-fold way updates. 



pears which is eliminated by changing the dynamics to 
rejection- free TV-fold way moves. However, the minimum 
in the diffusivity at the critical point remains with iV-fold 
way dynamics and the resulting peak in the histogram is 
not suppressed. With increasing system size this power- 
law divergence moves towards the critical energy of the 
infinite system, E C /2N = —0.71, as illustrated in the in- 
set of Fig. For both types of single-spin-flip moves we 
find that the rate of round trips between the magneti- 
cally ordered and disordered phases of the ferromagnet 
appear to scale as t ~ (NlnN) 2 as for the FFI model, 
see Fig. 0] 



FIG. 5: Average statistical error {A\n g(E)) e of the com- 
puted density of states of the FFI model versus the number 
of round trips in the energy interval [— N, 0]. The statistical 
errors were obtained for 16 independent runs and averaged 
over all energies for N = 36 (open symbols). Data points for 
larger system sizes are superimposed (solid symbols), with the 
system size increasing from right to left. The statistical error 
reduces like 1 /ground trips (solid line) for all system sizes. 
The inset shows the deviation, 8(E) = \n g(E) — hi g exax ,^(E), 
from the exact result for the 24 x 24 FMI model obtained for 
16 independent runs with 3.2 ■ 10 6 sweeps. 

can be orders of magnitude larger at low energy than 
at high energy Q. The statistical error is found to scale 
as A In g(E) ~ 1/ ground trips with the number of round 
trips in energy which is shown in the main panel of Fig. El 
For different system sizes we find the statistical errors to 
collapse onto a single 1 / -ground trips dependence which 
a posteriori validates our goal of maximizing the rate of 
round trips. 



V. CONCLUSIONS 



IV. STATISTICAL ERRORS 

Finally we address the statistical errors of measure- 
ments performed during the simulation. Standard tools 
can be used for the error analysis as the simulated 
random walk in configuration space is a conventional 
Markov chain Monte Carlo simulation. Only the pro- 
jection of this random walk onto energy space becomes 
non-Mar kovian which is irrelevant for the measurements. 

For each batch mode step simulating a fixed statisti- 
cal ensemble w(E) we can measure the density of states, 
In g(E) = \nn w (E) — \nw(E), from the recorded equilib- 
rium distribution n w (E) . Comparing our results with the 
exact density of states we find perfect agreement within 
the statistical errors as illustrated for the FMI in the inset 
of Fig. The observed distribution of statistical errors 
is nearly flat in energy, which is a further improvement 
compared to flat-histogram simulations where the errors 



The presented algorithm should be widely applicable 
to study the equilibrium behavior of complex systems, 
such as glasses, dense fluids or polymers. To speed up the 
system's equilibration the rate of round trips in energy 
is maximized by systematically optimizing the statistical 
ensemble based on measurements of the local diffusiv- 
ity. We find that the relative statistical error in the den- 
sity of states as calculated with the new method scales 
as 0(1/ -ground trips). For the 2D ferromagnetic and 
fully frustrated Ising models the round-trip time from the 
ground state to the maximum entropy state scales like 
0([iVlog ^V] 2 ) which is a significant speedup compared 
to the power law behavior 0(N 2+Z ) of flat-histogram al- 
gorithms. 

The idea of performing round-trips in energy is simi- 
lar to the parallel tempering algorithm [j| which simu- 
lates replicas of the system at various temperatures. The 
swapping of replicas at neighboring temperatures can be 
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viewed as a random walk of the replicas along the temper- 
ature. In order to maximize the round-trips in tempera- 
ture one can use our algorithm to systematically optimize 
the simulated temperature set which we will discuss in a 
forthcoming publication [Tl| . 
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